aspect model
Probabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach which has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM.
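As a rough illustration of the mixture decomposition and tempered EM mentioned in the abstract, the sketch below fits a small aspect model of the form P(d, w) = sum_z P(z) P(d|z) P(w|z) to a count table. It is not the authors' implementation: the function name `plsa_tempered_em`, its parameters, the random initialization, and the particular tempering rule (raising the likelihood term to a power `beta` in the E-step) are assumptions chosen for illustration.

```python
import random


def plsa_tempered_em(counts, n_topics, beta=1.0, n_iters=50, seed=0):
    """Fit a toy aspect model to counts[d][w] = n(d, w).

    Hypothetical helper for illustration. Returns (p_z, p_d_z, p_w_z),
    where p_d_z[z][d] = P(d|z) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(n):
        # Random strictly positive distribution over n outcomes.
        xs = [rng.random() + 1e-3 for _ in range(n)]
        s = sum(xs)
        return [x / s for x in xs]

    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]

    for _ in range(n_iters):
        new_z = [0.0] * n_topics
        new_d = [[0.0] * n_docs for _ in range(n_topics)]
        new_w = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # Tempered E-step (assumed form): beta < 1 flattens the
                # posterior over aspects; beta = 1 gives ordinary EM.
                post = [p_z[z] * (p_d_z[z][d] * p_w_z[z][w]) ** beta
                        for z in range(n_topics)]
                total = sum(post)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    r = n * post[z] / total  # expected count for aspect z
                    new_z[z] += r
                    new_d[z][d] += r
                    new_w[z][w] += r
        # M-step: renormalize the expected counts.
        grand = sum(new_z)
        for z in range(n_topics):
            if new_z[z] > 0.0:
                p_d_z[z] = [x / new_z[z] for x in new_d[z]]
                p_w_z[z] = [x / new_z[z] for x in new_w[z]]
            p_z[z] = new_z[z] / grand
    return p_z, p_d_z, p_w_z


# Toy 4-document, 4-word table with two obvious blocks.
toy_counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
p_z, p_d_z, p_w_z = plsa_tempered_em(toy_counts, n_topics=2, beta=0.9)
```

In practice `beta` would presumably be chosen by monitoring held-out data rather than fixed in advance; the value 0.9 here is arbitrary.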
- North America > United States > California > Alameda County > Berkeley (0.14)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)
Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments
Popescul, Alexandrin, Ungar, Lyle H., Pennock, David M, Lawrence, Steve
Recommender systems leverage product and community information to target products to consumers. Researchers have developed collaborative recommenders, content-based recommenders, and (largely ad-hoc) hybrid systems. We propose a unified probabilistic framework for merging collaborative and content-based recommendations. We extend Hofmann's [1999] aspect model to incorporate three-way co-occurrence data among users, items, and item content. The relative influence of collaboration data versus content data is not imposed as an exogenous parameter, but rather emerges naturally from the given data sources. Global probabilistic models coupled with standard Expectation Maximization (EM) learning algorithms tend to drastically overfit in sparse-data situations, as is typical in recommendation applications. We show that secondary content information can often be used to overcome sparsity. Experiments on data from the ResearchIndex library of Computer Science publications show that appropriate mixture models incorporating secondary data produce significantly better quality recommenders than k-nearest neighbors (k-NN). Global probabilistic models also allow more general inferences than local methods like k-NN.
- Media (0.47)
- Information Technology (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)
Expectation-Propagation for the Generative Aspect Model
Minka, Thomas P., Lafferty, John
The generative aspect model is an extension of the multinomial model for text that allows word probabilities to vary stochastically across documents. Previous results with aspect models have been promising, but hindered by the computational difficulty of carrying out inference and learning. This paper demonstrates that the simple variational methods of Blei et al (2001) can lead to inaccurate inferences and biased learning for the generative aspect model. We develop an alternative approach that leads to higher accuracy at comparable cost. An extension of Expectation-Propagation is used for inference and then embedded in an EM algorithm for learning. Experimental results are presented for both synthetic and real data sets.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Africa (0.14)
- South America > Colombia (0.04)
- (6 more...)
User-Dependent Aspect Model for Collaborative Activity Recognition
Zheng, Vincent W. (Hong Kong University of Science and Technology) | Yang, Qiang (Hong Kong University of Science and Technology)
Activity recognition aims to discover one or more users’ actions and goals based on sensor readings. In the real world, a single user’s data are often insufficient for training an activity recognition model due to the data sparsity problem. This is especially true when we are interested in obtaining a personalized model. In this paper, we study how to collaboratively use different users’ sensor data to train a model that can provide personalized activity recognition for each user. We propose a user-dependent aspect model for this collaborative activity recognition task. Our model introduces user aspect variables to capture the user grouping information, so that a target user can also benefit from similar users in the same group when training the recognition model. In this way, we can greatly reduce the amount of valuable and expensive labeled data required to train the recognition model for each user. Our model is also capable of incorporating time information and handling new users in activity recognition. We evaluate our model on a real-world WiFi data set obtained from an indoor environment, and show that the proposed model can outperform several state-of-the-art baseline algorithms.
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Communications (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Modeling Semantic Question Context for Question Answering
Banerjee, Protima (Drexel University) | Han, Hyoil (Drexel University)
Within a Question Answering (QA) framework, Question Context plays a vital role. We define Question Context to be background knowledge that can be used to represent the user’s information need more completely than the terms in the query alone. This paper proposes a novel approach that uses statistical language modeling techniques to develop a semantic Question Context which we then incorporate into the Information Retrieval (IR) stage of QA. Our approach proposes an Aspect-Based Relevance Language Model as the basis of the Question Context Model. This model proposes that the sparse vocabulary of a query can be supplemented with semantic information from concepts (or aspects) related to query terms that already exist within the corpus. We incorporate the Aspect-Based Relevance Language Model into Question Context by first obtaining all of the latent concepts that exist in the corpus for a particular question topic. Then, we derive a likelihood of relevance that relates each Context Term (CT) associated with those aspects to the user’s query. Context Terms from the topics with the highest likelihood of relevance are then incorporated into the query language model based on their relevance score values. We use both query expansion and document model smoothing techniques and evaluate our approach using the traditional recall metric. Our results are promising and show significant improvements in recall at low levels of precision using the query expansion method.
- North America > United States > Virginia > Fairfax County > McLean (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > United States > California > Napa County (0.04)
Exploiting Social Annotation for Automatic Resource Discovery
Plangprasopchok, Anon, Lerman, Kristina
Information integration applications, such as mediators or mashups, that require access to information resources currently rely on users manually discovering and integrating them in the application. Manual resource discovery is a slow process, requiring the user to sift through results obtained via keyword-based search. Although search methods have advanced to include evidence from a document's contents, its metadata, and the contents and link structure of the referring pages, they still do not adequately cover information sources (often called "the hidden Web") that dynamically generate documents in response to a query. The recently popular social bookmarking sites, which allow users to annotate and share metadata about various information sources, provide rich evidence for resource discovery. In this paper, we describe a probabilistic model of the user annotation process in a social bookmarking system, del.icio.us. We then use the model to automatically find resources relevant to a particular information domain. Our experimental results on data obtained from del.icio.us
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Modeling User Rating Profiles For Collaborative Filtering
In this paper we present a generative latent variable model for rating-based collaborative filtering called the User Rating Profile model (URP). The generative process which underlies URP is designed to produce complete user rating profiles, an assignment of one rating to each item for each user. Our model represents each user as a mixture of user attitudes, and the mixing proportions are distributed according to a Dirichlet random variable. The rating for each item is generated by selecting a user attitude for the item, and then selecting a rating according to the preference pattern associated with that attitude. URP is related to several models including a multinomial mixture model, the aspect model [7], and LDA [1], but has clear advantages over each.
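The generative process described in this abstract can be sketched roughly as follows: draw Dirichlet-distributed mixing proportions for a user, then for each item pick an attitude and draw a rating from that attitude's preference pattern. The function name `sample_user_profile`, the layout of `rating_probs`, and the toy parameter values are hypothetical, and the sketch covers sampling only, not the inference or learning procedure of the paper.

```python
import random


def sample_user_profile(alpha, rating_probs, rng):
    """Sample one complete rating profile (hypothetical helper).

    alpha: Dirichlet parameters, one per user attitude.
    rating_probs[item][attitude]: probabilities over ratings 1..V
        (an assumed parameter layout, for illustration only).
    """
    # Mixing proportions theta ~ Dirichlet(alpha), via normalized Gamma draws.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    theta = [g / total for g in gammas]

    attitudes = list(range(len(theta)))
    profile = []
    for item_probs in rating_probs:
        # Select a user attitude for this item.
        z = rng.choices(attitudes, weights=theta)[0]
        # Select a rating from that attitude's preference pattern.
        values = list(range(1, len(item_probs[z]) + 1))
        r = rng.choices(values, weights=item_probs[z])[0]
        profile.append(r)
    return theta, profile


# Toy setup: 2 attitudes, 3 items, ratings in {1, 2}.
toy_rating_probs = [[[0.9, 0.1], [0.1, 0.9]] for _ in range(3)]
theta, profile = sample_user_profile([1.0, 1.0], toy_rating_probs,
                                     random.Random(0))
```

Because every item receives a rating, the sampled `profile` is a complete user rating profile in the sense used in the abstract.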
- North America > Canada > Ontario > Toronto (0.15)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- North America > United States > Minnesota (0.04)
- Asia > Middle East > Jordan (0.04)
- Media (0.46)
- Banking & Finance > Credit (0.35)
- North America > Canada > Ontario > Toronto (0.15)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
- North America > United States > California > San Mateo County > San Mateo (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (0.46)